Abstract: In this paper, we deal with low-complexity near-optimal
detection/equalization in large-dimension multiple-input
multiple-output inter-symbol interference (MIMO-ISI) channels
using message passing on graphical models. A key contribution
of the paper is the demonstration that near-optimal
performance in MIMO-ISI channels with large dimensions can
be achieved at low complexities through simple yet effective
simplifications/approximations, even though the graphical models
that represent MIMO-ISI channels are fully/densely connected
(loopy graphs). These include 1) use of a Markov Random Field
(MRF) based graphical model with pairwise interaction, in
conjunction with message/belief damping, and 2) use of a Factor
Graph (FG) based graphical model with Gaussian approximation
of interference (GAI). The per-symbol complexities are $O(K^2 n_t^2)$
and $O(K n_t)$ for the MRF and the FG with GAI approaches,
respectively, where $K$ and $n_t$ denote the number of channel uses
per frame and the number of transmit antennas, respectively. These
low complexities are quite attractive for large dimensions, i.e.,
for large $K n_t$. From a performance perspective, these algorithms
are even more interesting in large dimensions since their
performance moves increasingly closer to that of optimum detection
as $K n_t$ increases. We also show that these message passing
algorithms can be used in an iterative manner with local neighborhood
search algorithms to improve the reliability/performance of M-QAM
symbol detection.

Index Terms — MIMO-ISI channels, severe delay spreads, large 
dimensions, low-complexity detection, graphical models, Markov 
random fields, pairwise interaction, factor graphs. 



I. Introduction 

Signaling in large dimensions can offer attractive benefits in
wireless communications. For example, transmission of signals
using large spatial dimensions in multiple-input multiple-output
(MIMO) systems with a large number of transmit/receive
antennas can offer increased spectral efficiencies [1]-[3]. The
spectral efficiency in a V-BLAST MIMO system is $n_t$ symbols
per channel use, where $n_t$ is the number of transmit
antennas [3]. Severely delay-spread inter-symbol interference
(ISI) channels¹ can offer opportunities to harness rich diversity
benefits [4]. In an $L$-length ISI channel, each symbol in a
frame is interfered by its previous $L - 1$ symbols. However, the
availability of $L$ copies of the transmitted signal in ISI channels
can be exploited to achieve $L$th order diversity. A way to
achieve this diversity is to organize data into frames, where
each frame consists of $K$ channel uses (i.e., $K$ dimensions in
time), $K > L$, and carry out joint detection/equalization over
the entire frame at the receiver. A MIMO-ISI channel with
large $K n_t$ and $L$ (referred to as a large-dimension MIMO-ISI
channel) is of interest because of its potential to offer high
spectral efficiencies (for large $n_t$) and diversity orders (for large
$L$). A major challenge, however, is detection complexity. The
complexity of optimum detection is exponential in the number
of dimensions, which is prohibitive for a large number of
dimensions. Our focus in this paper is to achieve near-optimal
detection performance in large dimensions at low complexities.
A powerful approach to realize this goal, which we investigate
in this paper, is message passing on graphical models.

Graphical models are graphs that indicate inter-dependencies
between random variables [10]. Well known
graphical models include Bayesian belief networks, factor
graphs, and Markov random fields [11]. Belief propagation
(BP) is a technique that solves inference problems using
graphical models. BP is a simple, yet highly effective,
technique that has been successfully employed in a variety
of applications including computational biology, statistical
signal/image processing, data mining, etc. BP is well suited to
several communication problems as well [10]; e.g., decoding
of turbo codes and LDPC codes [12], [13], multiuser detection
in CDMA [14]-[16], and MIMO detection [17]-[20].

Turbo equalization, which performs detection/equalization
and decoding in an iterative manner in coded data transmission
over ISI channels, has been widely studied [21], [22], [23].
More recently, message passing on factor graph based graphical
models [24] has been studied for detection/equalization
on ISI channels [25]-[30]. In [27], it has been shown through
simulations that application of the sum-product (SP) algorithm
to factor graphs in ISI channels converges to a good
approximation of the exact a posteriori probability (APP) of
the transmitted symbols. In [28], the problem of finding the
linear minimum mean square error (LMMSE) estimate of
the transmitted symbol sequence is addressed employing a
factor graph framework. Equalization in MIMO-ISI channels
using factor graphs is investigated in [29], [30]. In [29],
variable nodes of the factor graph correspond to the transmitted
symbols, and each channel use corresponds to a function
node. Since the received signal at any channel use depends on
the past $L$ symbols transmitted from every transmit antenna,
every function node is connected to $L n_t$ variable nodes.
Near-MAP (maximum a posteriori probability) performance was
shown through simulations for $n_t = 2$ systems. However, the
complexities involved in the computation of messages at the
variable and function nodes are exponential in $L n_t$, which is
prohibitive for large spatial dimensions and delay spreads. In
[30], a Gaussian approximation of interference is used, which
significantly reduces the complexity so that it scales well for large
$L$. However, in terms of performance, the algorithm in [30]
exhibited a high error floor².

¹ A practical example of a severely delay-spread ISI channel with large $L$ is
an ultra wideband (UWB) channel [5]. UWB channels are highly
frequency-selective, and are characterized by severe ISI due to large delay spreads
[6]-[9]. The number of multipath components (MPC) in such channels in
indoor/industrial environments has been observed to be of the order of several
tens to hundreds; numbers of MPCs ranging from 12 to 120 are common in
UWB channel models [6], [9].

Our key contribution in this paper is the demonstration that
graphical models can be effectively used to achieve near-optimal
detection/equalization performance in large-dimension
MIMO-ISI channels at low complexities. The achieved performance
is good because detection is performed jointly over
the entire frame of data; i.e., over the full $K n_t \times 1$ data
vector. While simple approximations/simplifications resulted
in low complexities, the large-dimension behavior³ natural in
message passing algorithms contributed to the near-optimal
performance in large dimensions. The graphical models we
consider in this paper are Markov random fields (MRF) and
factor graphs (FG). We show that algorithms based on these
graphical models perform increasingly closer to the optimum
performance for increasing $n_t$ and increasing values of $K$ and
$L$, keeping $L/K$ fixed.

In the case of the MRF approach (Section III), we show that the
use of damping of messages/beliefs, where messages/beliefs
are computed as a weighted average of the message/belief
in the previous iteration and the current iteration (details and
associated references given in Section III-D), is instrumental
in achieving good performance. Simulation results show that
the MRF approach exhibits large-dimension behavior, and
that damping significantly improves the bit error performance
(details given in Section III-F). For example, the MRF based
algorithm with message damping achieves close to unfaded
single-input single-output (SISO) AWGN performance (which
is a lower bound on the optimum detector performance) within
0.25 dB at $10^{-3}$ bit error rate (BER) in a MIMO-ISI channel
with $n_t = n_r = 4$, $K = 100$ channel uses per frame (i.e.,
problem size is $K n_t = 400$ dimensions), and $L = 20$
equal-energy multipath components (MPC). Similar performances
are shown for large-MIMO systems with $n_t = n_r = 16, 32$
and $K = 64$ (problem size $K n_t = 1024$ and $2048$
dimensions). The per-symbol complexity of the MRF approach is
$O(K^2 n_t^2)$ (details in Section III-E).

In the case of the FG approach (Section IV), the Gaussian
approximation of interference (GAI) we adopt is found to
be effective in further reducing the complexity by an order
(Section IV-A); i.e., the per-symbol complexity of the FG
with GAI approach is just $O(K n_t)$, which is one order less
than that of the MRF approach. The proposed FG with GAI
approach is also shown to exhibit large-dimension behavior;
its BER performance is almost the same as that of the MRF
approach, and is significantly better than that of the scheme
in [30] (Section IV-B). We also show that the proposed FG
with GAI algorithm can be used in an iterative manner with
local neighborhood search algorithms, like the reactive tabu
search (RTS) algorithm in [34], to improve the performance
of M-QAM detection (Section V).

² Figure 14 shows an error floor in the approach in [30], whereas, in the
same figure, our FG approach in Sec. IV is seen to avoid flooring and perform
significantly better.

³ We say that an algorithm exhibits 'large-dimension behavior' if its bit
error performance improves with increasing number of dimensions. The fact
that turbo codes with BP decoding achieve near-capacity performance only
when the frame sizes are large is an instance of large-dimension behavior.

Though the proposed algorithms are presented in the context
of uncoded systems, they can be extended to coded systems
as well, through turbo equalization [21]-[23] (Receiver C
in Fig. 1 of [23]) or through joint processing of the entire
coded frame using low-complexity graphical models
(low-complexity approximations of Receiver A in Fig. 1 of [23]). In
[19], we have investigated a scheme with separate MRF based
detection followed by decoding (Receiver B in Fig. 1 of [23])
in a 24 × 24 large-MIMO system, and showed that a coded
BER performance close to within 2.5 dB of the theoretical
ergodic MIMO capacity is achieved. MIMO space-time coding
schemes that can achieve separability of detection and decoding
without loss of optimality [43] are interesting because
they avoid the need for joint processing for optimal detection
and decoding. If such detection-decoding separable space-time
codes become available for large dimensions, the proposed
algorithms can be applied in their detection/equalization.

The rest of the paper is organized as follows. In Section II,
we present the considered MIMO system model in frequency
selective fading. In Section III, we present the proposed MRF
based BP detector with damping and its BER performance
in large dimensions. Section IV presents the FG with GAI
based BP detector and its BER performance. In Section V,
the proposed hybrid RTS-BP algorithm for detection of M-QAM
signals and its performance are presented. Conclusions
are presented in Section VI.

II. System Model 

We consider MIMO systems with cyclic prefixed single-carrier
(CPSC) signaling, where the overall MIMO channel
includes an FFT operation so that the transmitted symbols
are estimated from the received frequency-domain signal
(also referred to as SC-FDE: single-carrier modulation with
frequency-domain equalization) [44]-[46]. Unlike OFDM
signaling, CPSC signaling does not suffer from the peak to
average power ratio (PAPR) problem. Also, CPSC with an
FD-MMSE equalizer performs better than OFDM at large frame
sizes (large $K$) [46]. We will see that our proposed BP based
algorithms scale well for large dimensions in MIMO-CPSC
schemes (large $K n_t$) and perform significantly better than
MIMO-CPSC with an FD-MMSE equalizer as well as
MIMO-OFDM with an MMSE/ML equalizer.

Consider a frequency-selective MIMO channel with $n_t$
transmit and $n_r$ receive antennas as shown in Fig. 1. Let $L$
denote the number of multipath components (MPC). Data is
transmitted in frames, where each frame has $K'$ channel uses,
out of which data symbol vectors are sent in $K$ channel uses,
$K > L$. These $K$ channel uses are preceded by a cyclic prefix
(CP) of length $L - 1$ channel uses so that $K' = K + L - 1$.
In each channel use, an $n_t$-length data symbol vector is
transmitted using spatial multiplexing on $n_t$ transmit antennas.
Let $\mathbf{x}_q \in \{\pm 1\}^{n_t}$ denote the data symbol vector transmitted
in the $q$th channel use, $q = 0, 1, \cdots, K - 1$. Though the
symbol alphabet used here is BPSK, extensions to higher-order
alphabets are possible, and some are discussed later in
the paper. While the CP avoids inter-frame interference, there will
be ISI within the frame. The received signal vector at time $q$
can be written as

Fig. 1. MIMO-ISI channel model.

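$$\mathbf{y}_q = \sum_{l=0}^{L-1} \mathbf{H}_l \mathbf{x}_{q-l} + \mathbf{w}_q, \quad q = 0, \cdots, K-1, \qquad (1)$$

where $\mathbf{H}_l \in \mathbb{C}^{n_r \times n_t}$ is the channel gain matrix
for the $l$th MPC such that $H_l^{(j,i)}$ denotes the entry on the $j$th
row and $i$th column of the $\mathbf{H}_l$ matrix, i.e., $H_l^{(j,i)}$ is the channel
from the $i$th transmit antenna to the $j$th receive antenna on the $l$th
MPC. The entries of $\mathbf{H}_l$ are assumed to be i.i.d. $\mathcal{CN}(0, 1)$. It
is further assumed that $\mathbf{H}_l$, $l = 0, \cdots, L - 1$, remain constant
for one frame duration, and vary i.i.d. from one frame to the
other. $\mathbf{w}_q \in \mathbb{C}^{n_r}$ is the additive white Gaussian noise vector
at time $q$, whose entries are independent, each with variance
$\sigma^2 = n_t L E_s / \gamma$, where $\gamma$ is the average received SNR per
receive antenna. The CP renders the linearly convolving
channel a circularly convolving one, and so the channel is
multiplicative in the frequency domain. Because of the CP, the
received signal in the frequency domain, for the $i$th frequency
index ($0 \le i \le K - 1$), can be written as

$$\mathbf{r}_i = \mathbf{G}_i \mathbf{u}_i + \mathbf{v}_i, \qquad (2)$$

where
$\mathbf{r}_i = \frac{1}{\sqrt{K}} \sum_{q=0}^{K-1} e^{-\frac{2\pi j q i}{K}} \mathbf{y}_q$,
$\mathbf{u}_i = \frac{1}{\sqrt{K}} \sum_{q=0}^{K-1} e^{-\frac{2\pi j q i}{K}} \mathbf{x}_q$,
$\mathbf{v}_i = \frac{1}{\sqrt{K}} \sum_{q=0}^{K-1} e^{-\frac{2\pi j q i}{K}} \mathbf{w}_q$,
$\mathbf{G}_i = \sum_{l=0}^{L-1} e^{-\frac{2\pi j l i}{K}} \mathbf{H}_l$, and $j = \sqrt{-1}$.
Stacking the $K$ vectors $\mathbf{r}_i$, $i = 0, \cdots, K - 1$, we write

$$\mathbf{r} = \mathbf{G} \mathbf{F} \mathbf{x}_{eff} + \mathbf{v}_{eff}, \qquad (3)$$

where $\mathbf{r} = [\mathbf{r}_0^T \cdots \mathbf{r}_{K-1}^T]^T$,
$\mathbf{x}_{eff} = [\mathbf{x}_0^T \cdots \mathbf{x}_{K-1}^T]^T$,
$\mathbf{v}_{eff} = [\mathbf{v}_0^T \cdots \mathbf{v}_{K-1}^T]^T$,
$\mathbf{G}$ is the block-diagonal matrix with diagonal blocks $\mathbf{G}_0, \cdots, \mathbf{G}_{K-1}$,
$\mathbf{F} = \frac{1}{\sqrt{K}} \mathbf{D}_K \otimes \mathbf{I}_{n_t}$,
$\mathbf{D}_K$ is the $K$-point DFT matrix with entries $\rho_{q,i} = e^{-\frac{2\pi j q i}{K}}$, and
$\otimes$ denotes the Kronecker product. Equation (3) can be written in
an equivalent linear vector channel model of the form

$$\mathbf{r} = \mathbf{H} \mathbf{x} + \mathbf{v}, \qquad (4)$$

where $\mathbf{H} = \mathbf{H}_{eff} = \mathbf{G}\mathbf{F}$, $\mathbf{x} = \mathbf{x}_{eff}$, and $\mathbf{v} = \mathbf{v}_{eff}$. Note that
the well known MIMO system model for flat fading can be
obtained as a special case of the above system model with
$K = 1$.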
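As a sanity check on the frequency-domain model, the following Python sketch builds a noise-free, circularly convolving channel as in (1) (after CP removal), applies a unitary $K$-point DFT per receive antenna, and lets one verify the per-frequency relation $\mathbf{r}_i = \mathbf{G}_i \mathbf{u}_i$ of (2). The function names (`matvec`, `dft_vectors`, `circular_channel`, `freq_gain`) are illustrative choices and not part of the paper; the sketch uses plain nested lists and a direct $O(K^2)$ DFT rather than an FFT.

```python
import cmath
import math

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dft_vectors(vecs):
    """Unitary K-point DFT applied across a list of K equal-length vectors."""
    K, n = len(vecs), len(vecs[0])
    out = []
    for i in range(K):
        acc = [0j] * n
        for q in range(K):
            w = cmath.exp(-2j * math.pi * q * i / K)
            acc = [a + w * v for a, v in zip(acc, vecs[q])]
        out.append([a / math.sqrt(K) for a in acc])
    return out

def circular_channel(H_taps, X):
    """Noise-free (1) after CP removal: y_q = sum_l H_l x_{(q-l) mod K}."""
    K, n_r = len(X), len(H_taps[0])
    Y = []
    for q in range(K):
        y = [0j] * n_r
        for l, H_l in enumerate(H_taps):
            y = [a + b for a, b in zip(y, matvec(H_l, X[(q - l) % K]))]
        Y.append(y)
    return Y

def freq_gain(H_taps, i, K):
    """G_i = sum_l exp(-2*pi*j*l*i/K) H_l, the channel at frequency index i."""
    n_r, n_t = len(H_taps[0]), len(H_taps[0][0])
    G = [[0j] * n_t for _ in range(n_r)]
    for l, H_l in enumerate(H_taps):
        w = cmath.exp(-2j * math.pi * l * i / K)
        for a in range(n_r):
            for b in range(n_t):
                G[a][b] += w * H_l[a][b]
    return G
```

For any tap matrices `H_taps` (a list of $L$ matrices of size $n_r \times n_t$) and BPSK frame `X` (a list of $K$ vectors), `dft_vectors(circular_channel(H_taps, X))[i]` should agree with `matvec(freq_gain(H_taps, i, K), dft_vectors(X)[i])` up to rounding error.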

We further note that, in the considered system, signaling is
done along $K$ dimensions in time and $n_t$ dimensions in space,
so that the total number of dimensions involved is $K n_t$. We
are interested in low-complexity detection/equalization in large
dimensions (i.e., for large $K n_t$) using graphical models. The
goal is to obtain an estimate of the vector $\mathbf{x}$, given $\mathbf{r}$ and the
knowledge of $\mathbf{H}$. The optimal maximum a posteriori probability
(MAP) detector takes the joint posterior distribution

$$p(\mathbf{x} \mid \mathbf{r}, \mathbf{H}) \propto p(\mathbf{r} \mid \mathbf{x}, \mathbf{H}) \, p(\mathbf{x}) \qquad (5)$$

and marginalizes out each variable as
$p(x_i \mid \mathbf{r}, \mathbf{H}) = \sum_{\mathbf{x}_{-i}} p(\mathbf{x} \mid \mathbf{r}, \mathbf{H})$,
where $\mathbf{x}_{-i}$ stands for all entries of $\mathbf{x}$ except
$x_i$. The MAP estimate of the bit $x_i$, $i = 1, \cdots, K n_t$, is then
given by

$$\hat{x}_i = \arg\max_{a \in \{\pm 1\}} \; p(x_i = a \mid \mathbf{r}, \mathbf{H}), \qquad (6)$$

whose complexity is exponential in $K n_t$. In the following sections,
we present low-complexity detection algorithms based
on graphical models suited for the system model in (4) with
large dimensions, i.e., for large $K$, $L$, $n_t$, keeping $L/K$ fixed.
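To make the exponential cost of (5) and (6) concrete, here is a minimal brute-force sketch in Python. It assumes a uniform prior on the bits and a Gaussian likelihood proportional to $\exp(-\|\mathbf{r} - \mathbf{H}\mathbf{x}\|^2 / (2\sigma^2))$; the function name `map_bits_exhaustive` is hypothetical. The loop visits all $2^N$ candidate vectors, so it is usable only for very small $N = K n_t$.

```python
import itertools
import math

def map_bits_exhaustive(H, r, sigma2):
    """Per-bit MAP decisions by explicit marginalization over all 2^N vectors.

    H is a list of rows (real or complex), r is the received vector.
    Assumes a uniform prior, so the posterior is proportional to the likelihood.
    """
    n = len(H[0])
    # score[i][a] accumulates the unnormalized posterior mass of x_i = a.
    score = [{-1: 0.0, +1: 0.0} for _ in range(n)]
    for x in itertools.product((-1, +1), repeat=n):
        resid = [r[k] - sum(H[k][i] * x[i] for i in range(n))
                 for k in range(len(H))]
        w = math.exp(-sum(abs(e) ** 2 for e in resid) / (2 * sigma2))
        for i in range(n):
            score[i][x[i]] += w
    return [max((-1, +1), key=lambda a: score[i][a]) for i in range(n)]
```

With $K n_t = 400$, as in the examples considered later, the outer loop would need $2^{400}$ iterations, which is what motivates the message passing approximations that follow.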

III. Detection Using BP on Markov Random Fields 

In this section, we present a detection algorithm based on
message passing on an MRF graphical model of the MIMO
system model in (4).

A. Markov Random Fields 

An undirected graph is given by $G = (V, E)$, where
$V$ is the set of nodes and $E \subseteq \{(i, j) : i, j \in V, i \neq j\}$
is the set of undirected edges. An MRF is an undirected
graph whose vertices are random variables [35], [10]. The
statistical dependency among the variables is such that any
variable is independent of all the other variables, given its
neighbors. Usually, the variables in an MRF are constrained
by a compatibility function, also known as a clique potential in
the literature. A clique of an MRF is a fully connected sub-graph,
i.e., it is a subset $C \subseteq V$ such that $(i, j) \in E$ for all $i, j \in C$.
A clique is maximal if it is not a strict subset of another clique.
Therefore, a maximal clique does not remain fully connected
if any additional vertex of the MRF is included in it. For
example, in the MRF shown in Fig. 2, $\{x_1, x_2, x_3, x_4\}$ and
$\{x_3, x_4, x_5\}$ are two maximal cliques.

Fig. 2. An example of MRF.

Let there be $N_c$ maximal cliques in the MRF, and $\mathbf{x}_j$ be
the variables in maximal clique $j$. Let $\psi_j(\mathbf{x}_j)$ be the clique
potential of clique $j$. Then the joint distribution of the variables
is given by the Hammersley-Clifford theorem [47]

$$p(\mathbf{x}) = \frac{1}{Z} \prod_{j=1}^{N_c} \psi_j(\mathbf{x}_j), \qquad (7)$$

where $Z$ is a constant, also known as the partition function, chosen
to ensure the distribution is normalized. In Fig. 2, with two
maximal cliques in the MRF, namely, $\{x_1, x_2, x_3, x_4\}$ and
$\{x_3, x_4, x_5\}$, the joint probability distribution is given by

$$p(\mathbf{x}) = \frac{1}{Z} \, \psi_1(x_1, x_2, x_3, x_4) \, \psi_2(x_3, x_4, x_5). \qquad (8)$$

Pairwise MRF: An MRF is called a pairwise MRF if all the
maximal cliques in the MRF are of size two. In this case, the
clique potentials are all functions of two variables. The joint
distribution in such a case takes the form [11]

$$p(\mathbf{x}) \propto \Big( \prod_{i,j} \psi_{i,j}(x_i, x_j) \Big) \Big( \prod_{i} \phi_i(x_i) \Big), \qquad (9)$$

where $\psi_{i,j}(x_i, x_j)$ is the clique potential between nodes $x_i$
and $x_j$ denoting the statistical dependence between them, and
$\phi_i(x_i)$ is the self potential of node $x_i$.



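B. MRF of MIMO System

The MRF of a MIMO system is a fully connected graph.
Figure 3 shows the MRF for an 8 × 8 MIMO system. We get
the MRF potentials for the MIMO system where the posterior
probability function of the random vector $\mathbf{x}$, given $\mathbf{r}$ and $\mathbf{H}$,
is of the form⁴

$$p(\mathbf{x} \mid \mathbf{r}, \mathbf{H}) \propto \exp\Big( -\frac{1}{2\sigma^2} \|\mathbf{r} - \mathbf{H}\mathbf{x}\|^2 \Big) \exp\big( \ln p(\mathbf{x}) \big)$$
$$= \exp\Big( -\frac{1}{2\sigma^2} (\mathbf{r} - \mathbf{H}\mathbf{x})^H (\mathbf{r} - \mathbf{H}\mathbf{x}) \Big) \cdot \prod_i \exp\big( \ln p(x_i) \big)$$
$$\propto \exp\Big( -\frac{1}{2\sigma^2} \big( \mathbf{x}^H \mathbf{H}^H \mathbf{H} \mathbf{x} - 2\Re\{\mathbf{x}^H \mathbf{H}^H \mathbf{r}\} \big) \Big) \cdot \prod_i \exp\big( \ln p(x_i) \big). \qquad (10)$$

Now, defining $\mathbf{R} = \frac{1}{\sigma^2} \mathbf{H}^H \mathbf{H}$ and $\mathbf{z} = \frac{1}{\sigma^2} \mathbf{H}^H \mathbf{r}$, we can write
(10) as

$$p(\mathbf{x} \mid \mathbf{r}, \mathbf{H}) \propto \exp\Big( -\sum_{i<j} \Re\{x_i^* R_{ij} x_j\} \Big) \cdot \exp\Big( \sum_i \Re\{x_i^* z_i\} \Big) \prod_i \exp\big( \ln p(x_i) \big)$$
$$= \Big( \prod_{i<j} \exp\big( -x_i \Re\{R_{ij}\} x_j \big) \Big) \Big( \prod_i \exp\big( x_i \Re\{z_i\} + \ln p(x_i) \big) \Big), \qquad (11)$$

where $z_i$ and $R_{ij}$ are the elements of $\mathbf{z}$ and $\mathbf{R}$, respectively.
Comparing (11) and (9), we see that the MRF of the MIMO
system has only pairwise interactions with the following
potentials

$$\psi_{i,j}(x_i, x_j) = \exp\big( -x_i \Re\{R_{ij}\} x_j \big), \qquad (12)$$
$$\phi_i(x_i) = \exp\big( x_i \Re\{z_i\} + \ln p(x_i) \big). \qquad (13)$$

⁴ In our detection problem, relative values of the distribution for various
possibilities of $\mathbf{x}$ are adequate. So, we can omit the normalization constant
$Z$, which is independent of $\mathbf{x}$, and replace the equality with proportionality
in the distribution.

Fig. 3. Fully connected MRF of 8 × 8 MIMO system.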

C. Message Passing 

The values of $\psi$ and $\phi$ given by (12) and (13) define,
respectively, the edge and self potentials of an undirected
graphical model to which message passing algorithms, such
as belief propagation (BP), can be applied to compute the
marginal probabilities of the variables. BP attempts to estimate
the marginal probabilities of all the variables by way of passing
messages between the local nodes.
A message from node $j$ to node $i$ is denoted as $m_{j,i}(x_i)$,
and the belief at node $i$ is denoted as $b_i(x_i)$, $x_i \in \{\pm 1\}$. The
belief $b_i(x_i)$ is proportional to how likely $x_i$ was transmitted. On the
other hand, $m_{j,i}(x_i)$ is proportional to how likely $x_j$ thinks $x_i$
was transmitted. The belief at node $i$ is

$$b_i(x_i) \propto \phi_i(x_i) \prod_{j \in \mathcal{N}(i)} m_{j,i}(x_i), \qquad (14)$$

where $\mathcal{N}(i)$ denotes the neighboring nodes of node $i$, and the
messages are defined as [11]

$$m_{j,i}(x_i) \propto \sum_{x_j} \phi_j(x_j) \, \psi_{j,i}(x_j, x_i) \prod_{k \in \mathcal{N}(j) \setminus i} m_{k,j}(x_j). \qquad (15)$$

Equation (15) actually constitutes an iteration, as the message
is defined in terms of the other messages. So, BP
essentially involves computing the outgoing messages from a
node to each of its neighbors using the local joint compatibility
function and the incoming messages, and transmitting them.
The algorithm terminates after a fixed number of iterations.
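The updates (14) and (15) can be sketched in a few lines of Python for binary variables. The sketch below uses a synchronous (flooding) schedule and normalizes every message, both of which are implementation choices made here rather than details given in the text; `bp_marginals` and `exact_marginals` are hypothetical names. On a tree-structured MRF, BP is known to return the exact marginals once the number of iterations reaches the graph diameter, so comparing against brute-force marginals on a small chain is a convenient check. On the fully connected MIMO graph no such guarantee holds.

```python
import itertools

STATES = (-1, +1)

def normalize(d):
    s = sum(d.values())
    return {k: v / s for k, v in d.items()}

def bp_marginals(phi, psi, neighbors, num_iter):
    """Sum-product BP on a pairwise MRF with binary variables.

    phi[i][x] is the self potential, psi(j, i, xj, xi) the (symmetric) edge
    potential, and neighbors[i] lists the nodes adjacent to node i.
    """
    n = len(phi)
    # msgs[(j, i)][x_i] is the message from node j to node i.
    msgs = {(j, i): {x: 0.5 for x in STATES}
            for i in range(n) for j in neighbors[i]}
    for _ in range(num_iter):
        new = {}
        for (j, i) in msgs:
            out = {}
            for xi in STATES:
                total = 0.0
                for xj in STATES:
                    term = phi[j][xj] * psi(j, i, xj, xi)
                    for k in neighbors[j]:
                        if k != i:
                            term *= msgs[(k, j)][xj]   # incoming messages, (15)
                    total += term
                out[xi] = total
            new[(j, i)] = normalize(out)
        msgs = new
    beliefs = []
    for i in range(n):
        b = {x: phi[i][x] for x in STATES}
        for j in neighbors[i]:
            for x in STATES:
                b[x] *= msgs[(j, i)][x]                # belief, (14)
        beliefs.append(normalize(b))
    return beliefs

def exact_marginals(phi, psi, edges):
    """Brute-force marginals of (9) for comparison on small graphs."""
    n = len(phi)
    marg = [{x: 0.0 for x in STATES} for _ in range(n)]
    for x in itertools.product(STATES, repeat=n):
        w = 1.0
        for i in range(n):
            w *= phi[i][x[i]]
        for (i, j) in edges:
            w *= psi(i, j, x[i], x[j])
        for i in range(n):
            marg[i][x[i]] += w
    return [normalize(m) for m in marg]
```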

D. Improvement through Damping 

In systems characterized by fully/highly connected graphical
models, BP based algorithms may fail to converge, and
if they do converge, the estimated marginals may be far from
exact [36], [37]. It may be expected that BP might perform
poorly in MIMO graphs due to the high density of connections.
However, several methods are known in the literature,
including double loop methods [38], [39] and damping [40], [41],
which can be applied to improve performance if BP does not
converge (or converges too slowly). In this paper, we consider
damping methods.

In [40], Pretti proposed a modified version of BP with
over-relaxed BP dynamics. At each step of the algorithm,
the evaluation of messages is taken to be a weighted average
between the old estimate and the new estimate. The weighted
average could be applied either to the messages (resulting in
message damped BP), or to the estimate of the probability
distribution/beliefs of the variables (probability/belief damped
BP), or to both messages and beliefs (hybrid damped BP). It is
shown in [40] that the probability damped BP can be derived
as a limit case in which the double-loop algorithm becomes a
single-loop one.

Message Damped BP: Denoting $\tilde{m}_{i,j}^{(t)}(x_j)$ as the updated
message in iteration $t$ obtained by message passing, the new
message from node $i$ to node $j$ in iteration $t$, denoted by
$m_{i,j}^{(t)}(x_j)$, is computed as a convex combination of the old
message and the updated message as

$$\tilde{m}_{i,j}^{(t)}(x_j) \propto \sum_{x_i} \phi_i(x_i) \, \psi_{i,j}(x_i, x_j) \prod_{k \in \mathcal{N}(i) \setminus j} m_{k,i}^{(t-1)}(x_i), \qquad (16)$$

$$m_{i,j}^{(t)}(x_j) = \alpha_m \, m_{i,j}^{(t-1)}(x_j) + (1 - \alpha_m) \, \tilde{m}_{i,j}^{(t)}(x_j), \qquad (17)$$

where $\alpha_m \in [0, 1)$ is referred to as the message damping factor.

Belief Damping: Instead of damping the messages in each
iteration, the beliefs of the variables can be computed in each
iteration as a weighted average, as

$$\tilde{b}_i^{(t)}(x_i) \propto \phi_i(x_i) \prod_{j \in \mathcal{N}(i)} m_{j,i}^{(t)}(x_i), \qquad (18)$$

$$b_i^{(t)}(x_i) = \alpha_b \, b_i^{(t-1)}(x_i) + (1 - \alpha_b) \, \tilde{b}_i^{(t)}(x_i), \qquad (19)$$

where $\alpha_b \in [0, 1)$ is referred to as the belief damping factor.

Hybrid Damping: As a more general damping strategy, we
can update both the messages and the beliefs according
to (17) and (19), respectively, in each iteration. Different
combinations of $(\alpha_m, \alpha_b)$ values specialize to different strategies;
e.g., $(\alpha_m = \alpha_b = 0)$ corresponds to undamped
BP, $(\alpha_m \neq 0, \alpha_b = 0)$ corresponds to message damped BP,
$(\alpha_m = 0, \alpha_b \neq 0)$ corresponds to belief damped BP, and
$(\alpha_m \neq 0, \alpha_b \neq 0)$ corresponds to hybrid damped BP.

The proposed BP algorithm employing damping is listed in
Table I.
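A compact Python sketch of the damped updates (16) to (19) on the fully connected MRF is given below. It is only an illustration under stated assumptions: BPSK symbols with a uniform prior, messages and beliefs normalized before the convex combination, and a synchronous update schedule. The function name `damped_bp_detect` and its default parameter values are hypothetical, and the potentials are evaluated in the linear domain, so very large $|\Re\{z_i\}|$ could overflow in this form.

```python
import math

STATES = (-1, +1)

def damped_bp_detect(R, z, alpha_m=0.2, alpha_b=0.0, num_iter=5, prior=0.5):
    """Hybrid damped BP on a fully connected pairwise MRF; returns +/-1 decisions.

    R and z play the roles of H^H H / sigma^2 and H^H r / sigma^2.
    """
    n = len(z)
    phi = [{x: math.exp(x * complex(z[i]).real + math.log(prior)) for x in STATES}
           for i in range(n)]

    def psi(i, j, xi, xj):
        return math.exp(-xi * complex(R[i][j]).real * xj)

    # msg[(i, j)][x_j] is the message from node i to node j.
    msg = {(i, j): {x: 0.5 for x in STATES}
           for i in range(n) for j in range(n) if i != j}
    belief = [{x: 0.5 for x in STATES} for _ in range(n)]

    for _ in range(num_iter):
        new_msg = {}
        for (i, j), old in msg.items():
            upd = {}
            for xj in STATES:
                total = 0.0
                for xi in STATES:
                    term = phi[i][xi] * psi(i, j, xi, xj)
                    for k in range(n):
                        if k != i and k != j:
                            term *= msg[(k, i)][xi]        # messages from t-1, (16)
                    total += term
                upd[xj] = total
            s = sum(upd.values())
            # Convex combination of old and updated message, (17).
            new_msg[(i, j)] = {x: alpha_m * old[x] + (1 - alpha_m) * upd[x] / s
                               for x in STATES}
        msg = new_msg
        for i in range(n):
            upd = {x: phi[i][x] for x in STATES}
            for j in range(n):
                if j != i:
                    for x in STATES:
                        upd[x] *= msg[(j, i)][x]           # undamped belief, (18)
            s = sum(upd.values())
            # Convex combination of old and updated belief, (19).
            belief[i] = {x: alpha_b * belief[i][x] + (1 - alpha_b) * upd[x] / s
                         for x in STATES}
    return [max(STATES, key=lambda x: belief[i][x]) for i in range(n)]
```

Setting `alpha_m = alpha_b = 0` gives undamped BP; a nonzero `alpha_m` alone gives message damped BP. The straightforward inner product over `k` used here costs more than necessary per message; a practical implementation would typically form the full product of incoming messages once per node and divide out the excluded one.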

E. Computation Complexity 

The per-symbol complexity of calculating messages and
beliefs in a single BP iteration is $O(K^2 n_t^2)$ and $O(K n_t)$,
respectively. Likewise, the per-symbol complexity of computing
$\phi$ and $\psi$ is $O(1)$ and $O(K n_t)$, respectively. The computation
of $\mathbf{z}$ can be carried out with $O(K n_r)$ per-symbol complexity.
The computation of $\mathbf{R}$ involves computation of $\mathbf{H}^H \mathbf{H}$, which
involves three operations: i) computation of $\mathbf{G}$, ii) calculation
of $\mathbf{G}^H \mathbf{G}$, and iii) multiplication of $\mathbf{F}^H$ and $\mathbf{F}$ with $\mathbf{G}^H \mathbf{G}$.
The computation i) involves a $K$-point FFT of the matrices $\mathbf{H}_l$,
$l = 0, \cdots, L - 1$, each $\mathbf{H}_l$ of dimension $n_r \times n_t$. The complexity
associated with this operation is $O(n_t n_r K \log_2 K)$. The total
number of symbols transmitted is $K n_t$. So, the per-symbol
complexity is $O(n_r \log_2 K)$. The computation ii) involves the
calculation of $\mathbf{G}_i^H \mathbf{G}_i$ for $i = 0, \cdots, K - 1$. The computation
of each $\mathbf{G}_i^H \mathbf{G}_i$ has complexity $O(n_t^3)$. Due to the block-diagonal
structure of $\mathbf{G}$, $K$ such computations can be done in $O(K n_t^3)$
complexity, leading to a per-symbol complexity of $O(n_t^2)$.
Likewise, due to the block-symmetric structure of $\mathbf{F}$, the
per-symbol complexity corresponding to computation iii) is
$O(K n_t^2)$. Since the number of BP iterations is much less than
$K n_t$, the overall per-symbol complexity of the proposed
MRF based BP detection algorithm is $O(K^2 n_t^2)$,
which scales well for large $K n_t$.

F. Simulation Results 

In this section, we present the simulated BER performance
of the proposed MRF BP detection algorithm.

Performance in Flat-Fading with Large $n_t$: In Figs. 4 to 6,
we illustrate the 'large-dimension behavior' of the algorithm
and the effect of damping for a large number (tens) of transmit
and receive antennas with BPSK modulation on flat fading
channels (i.e., $L = K = 1$). The number of BP iterations
is 5. Figure 4 shows the variation of the achieved BER as
a function of the message damping factor, $\alpha_m$, in 16 × 16
and 24 × 24 V-BLAST MIMO systems at an average received
SNR per receive antenna, $\gamma$, of 8 dB. Note that $\alpha_m = 0$
corresponds to the case of undamped BP. It can be observed
from Fig. 4 that, depending on the choice of the value of
$\alpha_m$, message damping can significantly improve the BER
performance of the BP algorithm. There is an optimum value
of $\alpha_m$ at which the BER improvement over the no damping case
is maximum. For the chosen set of system parameters in
Fig. 4, the optimum value of $\alpha_m$ is observed to be about
0.2. For this optimum value of $\alpha_m = 0.2$, it is observed
that about an order of magnitude improvement in BER is achieved with
message damping compared to that without damping. From
Fig. 4, it can further be seen that the performance improves
for increasing $n_t = n_r$ (i.e., performance of the $n_t = n_r = 24$
system is better than that of the $n_t = n_r = 16$ system). This
shows that the algorithm exhibits 'large-dimension behavior,'
where the BER performance moves closer towards unfaded
SISO AWGN performance when $n_t = n_r$ is increased from
16 to 24. This large-dimension behavior is illustrated even
more clearly in Fig. 5, where we plot the BER performance
of V-BLAST MIMO as a function of SNR for different
$n_t = n_r = 4, 8, 16, 24$ and $32$ for $\alpha_m = 0.2$.

Initialization
1.  $m_{i,j}^{(0)}(x_j) = b_i^{(0)}(x_i) = 0.5$, $p(x_i = 1) = p(x_i = -1) = 0.5$, $\forall i, j = 1, \cdots, K n_t$
2.  $\tilde{m}_{i,j}^{(0)}(x_j) = \tilde{b}_i^{(0)}(x_i) = 0.5$, $\forall i, j = 1, \cdots, K n_t$
3.  $\mathbf{z} = \frac{1}{\sigma^2} \mathbf{H}^H \mathbf{r}$; $\mathbf{R} = \frac{1}{\sigma^2} \mathbf{H}^H \mathbf{H}$
4.  for $i = 1$ to $K n_t$
5.      $\phi_i(x_i) = \exp\big( x_i \Re\{z_i\} + \ln(p(x_i)) \big)$
6.  end for
7.  for $i = 1$ to $K n_t$
8.      for $j = 1$ to $K n_t$, $j \neq i$
9.          $\psi_{i,j}(x_i, x_j) = \exp\big( -x_i \Re\{R_{ij}\} x_j \big)$
10.     end for
11. end for
Iterative Update of Messages and Beliefs
12. for $t = 1$ to num_iter
    Damped Message Calculation
13.     for $i = 1$ to $K n_t$
14.         for $j = 1$ to $K n_t$, $j \neq i$
15.             $\tilde{m}_{i,j}^{(t)}(x_j) \propto \sum_{x_i} \phi_i(x_i) \psi_{i,j}(x_i, x_j) \prod_{k \in \mathcal{N}(i) \setminus j} m_{k,i}^{(t-1)}(x_i)$
16.             $m_{i,j}^{(t)}(x_j) = \alpha_m m_{i,j}^{(t-1)}(x_j) + (1 - \alpha_m) \tilde{m}_{i,j}^{(t)}(x_j)$
17.         end for
18.     end for
    Damped Belief Calculation
19.     for $i = 1$ to $K n_t$
20.         $\tilde{b}_i^{(t)}(x_i) \propto \phi_i(x_i) \prod_{j \in \mathcal{N}(i)} m_{j,i}^{(t)}(x_i)$
21.         $b_i^{(t)}(x_i) = \alpha_b b_i^{(t-1)}(x_i) + (1 - \alpha_b) \tilde{b}_i^{(t)}(x_i)$
22.     end for
23. end for; End of for loop starting at line 12
24. $\hat{x}_i = \arg\max_{x_i \in \{\pm 1\}} b_i^{(\text{num\_iter})}(x_i)$, $\forall i = 1, \cdots, K n_t$
25. Terminate

TABLE I
Proposed MRF Based BP Detector/Equalizer Algorithm.

In Fig. 6, we present a comparison of the BER performance
achieved using message damping, belief damping and hybrid
damping based BP detection of an 8 × 8 non-orthogonal space-time
block code (STBC) from cyclic division algebra (CDA)
with $t = e^{j}$, $\delta = e^{\sqrt{5} j}$ at 8 dB SNR. In this type of
STBC, each STBC is an $n_t \times p$ square matrix with $n_t$ transmit
antennas and $p = n_t$ time slots constructed using $n_t^2$ symbols,
which results in $n_t^2$ dimensions and $n_t$ symbols per channel
use. For message damping and belief damping, $\alpha_m$ and $\alpha_b$
are varied in the range 0 to 1. For hybrid damping, we set
$\alpha_m = \alpha_b$ and varied it in the range 0 to 1. From Fig. 6, it can
be seen that i) with damping, there is an optimum value of the
damping factor at which the BER performance is the best (e.g.,
for message damping, the optimum damping factor is about
0.3 in Fig. 6), ii) message damping performs better than belief
damping for small values of the damping factor, whereas belief
damping performs better at high values of the damping factor;
however, over the entire range of the damping factor, the
best performance of message damping is significantly better
than the best performance of belief damping, and iii) for the
chosen condition of $\alpha_m = \alpha_b$, hybrid damping performance
is similar to that of message damping; however, $\alpha_m$ and $\alpha_b$ in
hybrid damping can be jointly optimized to further improve
the performance.

Fig. 4. BER performance of the MRF BP algorithm as a function of message
damping factor, $\alpha_m$, in V-BLAST MIMO with $n_t = n_r = 16, 24$ on flat
fading ($L = K = 1$) at 8 dB SNR. # BP iterations = 5.

Fig. 5. BER performance of the MRF BP algorithm as a function of SNR
in V-BLAST MIMO for different $n_t = n_r$ on flat fading ($L = K = 1$) with
message damping, $\alpha_m = 0.2$, and # BP iterations = 5.

Fig. 6. Effect of message, belief, and hybrid damping on the BER
performance of 8 × 8 STBC from CDA with $t = e^{j}$, $\delta = e^{\sqrt{5} j}$, $n_t = n_r = 8$
on flat fading ($L = K = 1$) at 8 dB SNR. MRF BP, # BP iterations = 5,
$\alpha_m = \alpha_b$ for hybrid damping.

Fig. 7. BER performance of the MRF BP algorithm as a function of the
message damping factor, $\alpha_m$, in MIMO-ISI channels, $n_t = n_r = 4$,
[$L = 10$, $K = 50$], uniform power delay profile, average received SNR = 6 dB,
# BP iterations = 7.

Performance in MIMO-ISI Channels with Large $K n_t$: In
Fig. 7, we explore the effect of message damping on the
BER performance of the MRF based BP detector/equalizer
in MIMO-ISI channels. In all the simulations of MIMO-ISI
channels, we have taken a uniform power delay profile (i.e.,
all the $L$ paths are assumed to have equal energy). Figure
7 shows the variation of the achieved BER as a function of
the message damping factor, $\alpha_m$, for $n_t = n_r = 4$, BPSK,
[$L = 10$, $K = 50$], at an average received SNR of 6 dB.
The total number of dimensions is $K n_t = 200$. The number of
BP iterations used is 7. From Fig. 7, it can be seen that
damping can significantly improve the BER performance of
the BP algorithm. For the chosen set of system parameters in
Fig. 7, the optimum value of $\alpha_m$ is observed to be about 0.45,
which gives about an order of magnitude improvement in BER. The
benefit of damping in terms of BER performance (and
also in terms of convergence) is even more clearly brought
out in Fig. 8, where we have compared the BER performance
without damping ($\alpha_m = 0$) and with damping ($\alpha_m = 0.45$)
for [$L = 20$, $K = 100$] at an SNR of 7 dB as a function
of the number of BP iterations. It is interesting to see that
without damping (i.e., with $\alpha_m = 0$), the algorithm indeed
shows 'divergence' behavior, i.e., BER increases as the number of
iterations is increased beyond 4. Such divergence behavior is
effectively removed by damping, as can be seen from the BER
performance achieved with $\alpha_m = 0.45$. Indeed, the algorithm
with damping ($\alpha_m = 0.45$) is seen to converge smoothly.
It is also interesting to note that the algorithm converges to a
BER which is quite close to the unfaded SISO AWGN BER
(BER on SISO AWGN at 7 dB SNR is about $7.8 \times 10^{-4}$
and the converged BER using damped BP is about $1 \times 10^{-3}$).
This illustrates the potential of damping in improving BER
performance and convergence of the algorithm when employed
for detection/equalization in the considered MIMO system
on severely delay spread frequency-selective channels (e.g.,
$L = 20$). It is also noted that damping (as per Eqn. (17)) does
not increase the order of complexity of the algorithm;
the order of complexity without and with damping
remains the same.
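The unfaded SISO AWGN reference value quoted above can be reproduced from the standard BPSK error probability $P_b = Q(\sqrt{2\gamma}) = \frac{1}{2}\,\mathrm{erfc}(\sqrt{\gamma})$. The short Python sketch below assumes that the SNR $\gamma$ is the per-symbol $E_s/N_0$ on a linear scale; the function name `bpsk_awgn_ber` is an illustrative choice.

```python
import math

def bpsk_awgn_ber(snr_db):
    """BER of coherent BPSK on an unfaded AWGN channel: 0.5 * erfc(sqrt(SNR))."""
    snr = 10 ** (snr_db / 10)          # dB to linear scale
    return 0.5 * math.erfc(math.sqrt(snr))

ber_7db = bpsk_awgn_ber(7.0)           # close to 7.8e-4, the lower bound cited above
```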

Comparison with MIMO-OFDM Performance: In Fig. 9,
we present a performance comparison between the considered
MIMO-CPSC scheme and a MIMO-OFDM scheme for the
same system/channel parameters in both cases, for $n_t = n_r = 4$
and the following combinations of $L$ and $K$: $[L = 5, K = 25]$,
$[L = 10, K = 50]$, $[L = 20, K = 100]$. For MIMO-CPSC, two
detection schemes are considered: FD-MMSE and the proposed
MRF BP. For MRF BP, the number of BP iterations used is
10 and the value of $\alpha_m$ used is 0.45. For MIMO-OFDM,
two detection schemes, namely MMSE and ML detection
on each subcarrier, are considered. We have also plotted the
unfaded SISO AWGN performance, which serves as a lower
bound on the optimum detection performance. The following
observations can be made from Fig. 9: i) MIMO-OFDM with
MMSE detection performs the worst among all the considered
system/detection configurations, ii) MIMO-CPSC with FD-MMSE
performs better than MIMO-OFDM with MMSE (this
better performance in CPSC is in line with other reported
comparisons between OFDM and CPSC, e.g., [44], [45], [46]),
iii) at the expense of increased detection complexity, MIMO-OFDM
with ML detection performs better than both MIMO-OFDM
with MMSE and MIMO-CPSC with FD-MMSE, and
iv) more interestingly, MIMO-CPSC with the low-complexity

Fig. 8. Comparison of the BER performance of message damped and
undamped MRF BP detector/equalizer as a function of number of BP iterations
in MIMO-ISI channels, $n_t = n_r = 4$, $[L = 20, K = 100]$, uniform
power delay profile, average received SNR = 7 dB, $\alpha_m = 0$ (undamped),
$\alpha_m = 0.45$ (damped).




Fig. 9. BER performance of message damped MRF BP detector/equalizer as
a function of average received SNR in MIMO-ISI channels with $n_t = n_r = 4$
for different values of $L$ and $K$ keeping $L/K$ constant: $[L = 5, K = 25]$,
$[L = 10, K = 50]$, and $[L = 20, K = 100]$. Uniform power delay profile.
# BP iterations = 10, $\alpha_m = 0.45$.



IV. Detection using BP on Factor Graphs with 
Gaussian Approximation of Interference 

In this section, we present another low-complexity algorithm
based on BP for detection in large-dimension MIMO-ISI
channels. The graphical model employed here is a factor
graph. A key idea in the proposed factor graph approach,
which enables low complexity, is the Gaussian
approximation of interference (GAI) in the system.

Consider the MIMO system model $\mathbf{r} = \mathbf{H}\mathbf{x} + \mathbf{v}$ given earlier.
We will treat each entry of the observation vector $\mathbf{r}$ as a
function node (observation node) in a factor graph, and each
transmitted symbol as a variable node. The received signal $r_i$
can be written as

$$ r_i = \sum_{j=1}^{Kn_t} h_{ij} x_j + v_i = h_{ik} x_k + \underbrace{\sum_{j=1,\, j \neq k}^{Kn_t} h_{ij} x_j}_{\text{Interference}} + v_i. \qquad (20) $$

When computing the message from the $i$th observation node
to the $k$th variable node, we make the following Gaussian
approximation of the interference:

$$ r_i = h_{ik} x_k + \underbrace{\sum_{j=1,\, j \neq k}^{Kn_t} h_{ij} x_j + v_i}_{\triangleq\; z_{ik}}, \qquad (21) $$



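where the interference plus noise term, $z_{ik}$, is modeled as
$\mathcal{CN}(\mu_{z_{ik}}, \sigma^2_{z_{ik}})$ with

$$ \mu_{z_{ik}} = \sum_{j=1,\, j \neq k}^{Kn_t} h_{ij} E(x_j), \qquad (22) $$

$$ \sigma^2_{z_{ik}} = \sum_{j=1,\, j \neq k}^{Kn_t} |h_{ij}|^2 \, \mathrm{Var}(x_j) + \sigma^2. \qquad (23) $$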



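For BPSK signaling, the log-likelihood ratio (LLR) of the
symbol $x_k \in \{+1, -1\}$ at observation node $i$, denoted by
$\Lambda_i^k$, can be written as

$$ \Lambda_i^k = \log \frac{p(r_i \mid \mathbf{H}, x_k = 1)}{p(r_i \mid \mathbf{H}, x_k = -1)} = \frac{4}{\sigma^2_{z_{ik}}} \, \Re\big( h_{ik}^{*} (r_i - \mu_{z_{ik}}) \big). \qquad (24) $$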
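For readers who want the intermediate step, the closed form in (24) follows directly from the Gaussian model of the interference plus noise term. The short derivation below is a sketch under that model; the symbol $p_j$ (the current belief that $x_j = +1$ at the observation node) is introduced here only for illustration.

```latex
% Assumption: z_{ik} ~ CN(\mu_{z_{ik}}, \sigma^2_{z_{ik}}) as in (22)-(23),
% and p_j denotes the current belief P(x_j = +1) for a BPSK symbol x_j.
\begin{align*}
p(r_i \mid \mathbf{H}, x_k = \pm 1)
  &\propto \exp\!\left( -\frac{|r_i - \mu_{z_{ik}} \mp h_{ik}|^2}{\sigma^2_{z_{ik}}} \right), \\
\Lambda_i^k
  &= \frac{|r_i - \mu_{z_{ik}} + h_{ik}|^2 - |r_i - \mu_{z_{ik}} - h_{ik}|^2}{\sigma^2_{z_{ik}}}
   = \frac{4}{\sigma^2_{z_{ik}}} \, \Re\!\big( h_{ik}^{*} (r_i - \mu_{z_{ik}}) \big), \\
E(x_j) &= 2p_j - 1, \qquad
\mathrm{Var}(x_j) = 1 - (2p_j - 1)^2 = 4 p_j (1 - p_j).
\end{align*}
```

The last line gives the BPSK mean and variance that enter (22) and (23), so each message needs only the beliefs $p_j$ and the channel gains.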



MRF BP detection significantly outperforms MIMO-OFDM
even with ML detection. Indeed, the performance of
MIMO-CPSC with MRF BP detection gets increasingly closer
to the SISO AWGN performance for increasing $L$, $K$, keeping
$L/K$ constant. For example, the gap between the MRF BP
performance and the SISO AWGN performance is only about
0.25 dB for $L = 20$ at a BER of $10^{-3}$. This illustrates
the ability of the MRF BP algorithm to achieve near-optimal
performance for severely delay spread MIMO-ISI channels
(i.e., large $L$) as witnessed in UWB systems.



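The LLR values computed at the observation nodes are
passed to the variable nodes (Fig. 10a). Using these LLRs,
the variable nodes compute the probabilities

$$ p_i^{k+} \triangleq p_i(x_k = +1 \mid \mathbf{r}) = \frac{\exp\big( \sum_{l=1,\, l \neq i}^{Kn_r} \Lambda_l^k \big)}{1 + \exp\big( \sum_{l=1,\, l \neq i}^{Kn_r} \Lambda_l^k \big)}, \qquad (25) $$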



and pass them back to the observation nodes (Fig. 10b).
This message passing is carried out for a certain number of
iterations. Messages can be damped as described in Section
III-D and then passed. Finally, $x_k$ is detected as

$$ \hat{x}_k = \mathrm{sgn}\Big( \sum_{i=1}^{Kn_r} \Lambda_i^k \Big). \qquad (26) $$

Fig. 10. Message passing between variable nodes and observation nodes:
(a) from observation nodes to variable nodes, using Eqns. (24), (22), (23);
(b) from variable nodes to observation nodes, using Eqn. (25).



Note that approximating the interference as Gaussian greatly 
simplifies the computation of messages (as can be seen from 
the complexity discussion in the following subsection.) 
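The message passing in (22) to (26) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the nested-list data layout, and in particular the choice to damp the probability messages as a convex combination of the old and new values are assumptions made here (the damping rule itself is defined in Section III-D). The interference mean and variance are obtained by computing the full sum once per observation node and subtracting the $k$th term.

```python
import math

def _sigmoid(t):
    """Numerically stable exp(t) / (1 + exp(t)), as used in Eqn. (25)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fg_gai_bp(H, r, noise_var, n_iter=10, damping=0.4):
    """Detect BPSK symbols from r = H x + v with FG-GAI style message passing.

    H is a list of rows (real or complex entries), r the observations,
    noise_var the noise variance (must be > 0). Returns +1/-1 decisions.
    """
    m, n = len(H), len(H[0])              # m observation nodes, n variable nodes
    p = [[0.5] * n for _ in range(m)]      # p[i][k]: belief P(x_k = +1) sent to node i
    llr = [[0.0] * n for _ in range(m)]    # llr[i][k]: LLR sent from node i to x_k

    for _ in range(n_iter):
        # Observation -> variable messages, Eqns. (22)-(24).
        for i in range(m):
            mean = [H[i][j] * (2.0 * p[i][j] - 1.0) for j in range(n)]
            var = [abs(H[i][j]) ** 2 * 4.0 * p[i][j] * (1.0 - p[i][j])
                   for j in range(n)]
            mean_all = sum(mean)
            var_all = sum(var) + noise_var
            for k in range(n):
                mu = mean_all - mean[k]    # full sum minus the k-th term
                s2 = var_all - var[k]
                llr[i][k] = 4.0 / s2 * (H[i][k].conjugate() * (r[i] - mu)).real
        # Variable -> observation messages, Eqn. (25), with assumed damping rule.
        for k in range(n):
            total = sum(llr[i][k] for i in range(m))
            for i in range(m):
                fresh = _sigmoid(total - llr[i][k])   # exclude node i's own LLR
                p[i][k] = damping * p[i][k] + (1.0 - damping) * fresh

    # Final decision, Eqn. (26).
    return [1 if sum(llr[i][k] for i in range(m)) >= 0 else -1 for k in range(n)]
```

With `damping=0.0` the update reduces to undamped message passing. The factor 4 in the LLR corresponds to the complex Gaussian model of (24); a purely real-valued system would need a different constant.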

A. Computation Complexity 

The computation complexity of the FG-GAI BP algorithm
described above involves i) LLR calculations at the observation
nodes as per (24), which has $O(K^2 n_t n_r)$ complexity, and ii)
calculation of probabilities at variable nodes as per (25), which
also requires $O(K^2 n_t n_r)$ complexity.† Hence, the overall
complexity of the algorithm is $O(K^2 n_t n_r)$ for detecting
$Kn_t$ transmitted symbols. So the per-symbol complexity is
just $O(Kn_t)$ for $n_t = n_r$. Note that this complexity is
one order less than that of the MRF based approach in the
previous section. Because of its linear complexity in $K$ and
$n_t$, the proposed FG approach with GAI is quite attractive for
detection in large-dimension MIMO-ISI channels. In addition,
the BER performance achieved by the algorithm in large
dimensions is very good (as shown in the BER performance
results in the following subsection).

†A naive implementation of (24) would require a summation over $Kn_t - 1$
variable nodes for each message, amounting to a complexity of order
$O(K^3 n_t^2 n_r)$. However, the summation over $Kn_t - 1$ variables in (22) can be
written in the form $\sum_{j=1}^{Kn_t} h_{ij} E(x_j) - h_{ik} E(x_k)$, where the computation of
the full summation from $j = 1$ to $Kn_t$ (which is independent of the variable
index $k$) requires $Kn_t - 1$ additions. In addition, one subtraction operation for
each $k$ is required. This makes the complexity order for computing (22)
only $O(K^2 n_t n_r)$. A similar argument holds for computation of the variance
in (23), and hence the complexity of computing the LLR in (24) becomes
$O(K^2 n_t n_r)$. Likewise, a similar rewriting of the summation in (25) leads to
a complexity of $O(K^2 n_t n_r)$.

B. Simulation Results

Figure 11 shows the simulated BER performance of the FG-GAI
BP algorithm in $n_t \times n_r$ V-BLAST MIMO with $n_t = n_r =
8, 16, 24, 32, 64$ and BPSK on flat fading ($L = K = 1$).
The number of BP iterations and the message damping factor
used are 10 and 0.4, respectively. We observe that, like the
MRF approach, the FG-GAI approach also exhibits large-dimension
behavior; e.g., $32 \times 32$ and $64 \times 64$ V-BLAST
systems perform close to unfaded SISO AWGN performance.
Similar large-dimension behavior is shown in Fig. 12 in
MIMO-ISI channels with $L = 6$ and $K = 64$ for $n_t = n_r =
4, 8, 16$; i.e., BERs move increasingly closer to the unfaded SISO
AWGN BER for increasing $Kn_t = 256, 512, 1024$. Figure
13 presents a comparison of the performances achieved by
the MRF and FG-GAI approaches for the following system
settings: $n_t = n_r = 4$, $[L = 5, K = 25]$, $[L = 20, K = 100]$,
and BPSK. It can be seen that, for these system settings, the
FG with GAI approach performs almost the same as the MRF
approach, at one order lower complexity than that of the MRF
approach.

Figure 14 presents a comparison of the performances
achieved by the proposed scheme and the scheme in [30] for
$n_t = n_r = 4$, $[L = 4, K = 400]$, and BPSK. It can be
seen that while the scheme in [30] exhibits an error floor, the
proposed scheme avoids flooring and achieves much better
performance. Such good performance is achieved because
equalization is done jointly on all the $Kn_t$ symbols in a frame.
The complexity of the scheme in [30] is $O(Ln_t)$, whereas
the complexity of the proposed scheme is $O(Kn_t)$. Though
$K > L$, the linear complexity of the proposed scheme in $K$
is still very attractive. Also, as with MRF BP, the FG-GAI BP
algorithm in MIMO-CPSC performs significantly better than
MIMO-OFDM even with ML detection.

V. Hybrid Algorithms Using BP and Local
Neighborhood Search for M-QAM

The BP algorithms proposed in the previous two sections
are for BPSK modulation, i.e., for $\mathbf{x} \in \{\pm 1\}^{Kn_t}$. They
can work for 4-QAM also by viewing the transmit symbol
vector to be in $\{\pm 1\}^{2Kn_t}$. Low-complexity algorithms for
detection/equalization of higher-order $M$-QAM, $M > 4$,
over large-dimension MIMO-ISI channels are of interest. A
BP based algorithm that is suited for higher-order QAM in
MIMO has been reported recently in [49]. The algorithm in
[49] uses a Gaussian tree approximation (GTA) to convert the
fully-connected graph representing the MIMO system into a
tree, and carries out BP on the resultant approximate tree.
We refer to this algorithm in [49] as the GTA BP algorithm.
In this section, we take an alternate hybrid approach for
efficient detection of $M$-QAM signals, where the proposed
FG-GAI BP algorithm for BPSK is used to improve the $M$-QAM
detection performance of local neighborhood search
algorithms. Simulation results (Fig. 16) show that the proposed
hybrid approach performs better than the GTA BP approach
in [49].

Fig. 11. BER performance of the FG-GAI BP algorithm in V-BLAST MIMO
systems with $n_t = n_r = 8, 16, 24, 32, 64$ on flat fading ($L = K = 1$). #
BP iterations = 20, $\alpha_m = 0.4$.

Fig. 12. BER performance of the FG-GAI BP algorithm in MIMO-ISI
channels with $[L = 6, K = 64]$ for $n_t = n_r = 4, 8, 16$. Uniform power
delay profile, # BP iterations = 10, $\alpha_m = 0.4$.

Fig. 13. Comparison of the BER performances of the MRF BP and FG-GAI
BP algorithms in MIMO-ISI channels with $n_t = n_r = 4$, $[L = 5, K = 25]$,
$[L = 20, K = 100]$, uniform power delay profile.

Fig. 14. Comparison of the BER performances of the FG-GAI BP scheme
and the scheme in [30] in MIMO-ISI channels with $n_t = n_r = 4$, $[L =
4, K = 400]$, uniform power delay profile.

Local Neighborhood Search Based Detection: Low-complexity
search algorithms that attempt to minimize the
maximum-likelihood (ML) cost $\|\mathbf{r} - \mathbf{H}\mathbf{x}\|^2$ by limiting the
search space to a local neighborhood have been proposed for
detection of $M$-QAM signals in MIMO, e.g., the tabu search
(TS) algorithm [32]-[34]. Such local neighborhood search
algorithms have the advantage of low complexity (e.g., TS
algorithms, like the proposed MRF BP algorithm, have quadratic
complexity in $Kn_t$), making them suited for large dimensions.
However, their higher-order QAM performance is away from
optimal performance. Here, we propose to improve the $M$-QAM
performance of these search algorithms through the
application of the proposed BP algorithms on the search
algorithm outputs. This approach essentially improves the
reliability of the output symbols from the local neighborhood
search, thereby improving the overall BER performance. We
apply this hybrid approach to the reactive tabu search (RTS)
algorithm in [34].

Hybrid RTS-BP Approach: In the following subsections, we
first present a brief summary of the RTS algorithm in [34] and
the motivation behind the proposed hybrid approach. Next,
we present the proposed hybrid RTS-BP algorithm and its
BER performance. Finally, we present a method to reduce
complexity based on the knowledge of the simulated pdf of
the RTS algorithm output.

A. Reactive Tabu Search (RTS) Algorithm 

Here, we present a brief summary of the RTS algorithm
in [34]. The RTS algorithm starts with an initial solution
vector, defines a neighborhood around it (i.e., defines a set
of neighboring vectors based on a neighborhood criterion),
and moves to the best vector among the neighboring vectors
(even if the best neighboring vector is worse, in terms of
ML cost $\|\mathbf{r} - \mathbf{H}\mathbf{x}\|^2$, than the current solution vector); this
allows the algorithm to escape from local minima. This process
is continued for a certain number of iterations, after which
the algorithm is terminated and the best among the solution
vectors in all the iterations is declared as the final solution
vector. In defining the neighborhood of the solution vector
in a given iteration, the algorithm attempts to avoid cycling
by making the moves to solution vectors of the past few
iterations 'tabu' (i.e., prohibits these moves), which ensures
efficient search of the solution space. The number of these
past iterations is parametrized as the 'tabu period,' which is
dynamically changed depending on the number of repetitions
of the solution vectors that are observed in the search path
(e.g., increase the tabu period if more repetitions are observed).
The per-symbol complexity of the RTS algorithm is quadratic
in $Kn_t$ for $n_t = n_r$.
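The search loop summarized above can be illustrated with a deliberately simplified sketch. The code below is not the RTS algorithm of [34]: it assumes BPSK symbols, a neighborhood made of single-coordinate sign flips, and a fixed (non-reactive) tabu period, and it recomputes the ML cost from scratch for clarity rather than efficiency. All function and parameter names are hypothetical.

```python
def ml_cost(H, r, x):
    """ML cost ||r - H x||^2 for a candidate vector x."""
    return sum(abs(r[i] - sum(H[i][j] * x[j] for j in range(len(x)))) ** 2
               for i in range(len(r)))

def tabu_search(H, r, x0, n_iter=50, tabu_period=2):
    """Simplified tabu search over +1/-1 vectors (fixed tabu period)."""
    x = list(x0)
    best, best_cost = list(x), ml_cost(H, r, x)
    tabu_until = [0] * len(x)            # flipping coordinate i is tabu while tabu_until[i] >= it
    for it in range(1, n_iter + 1):
        candidates = []
        for i in range(len(x)):
            if tabu_until[i] >= it:      # move is currently prohibited
                continue
            y = list(x)
            y[i] = -y[i]
            candidates.append((ml_cost(H, r, y), i))
        if not candidates:
            continue
        cost, i = min(candidates)
        x[i] = -x[i]                     # accept the best neighbor even if it is worse
        tabu_until[i] = it + tabu_period
        if cost < best_cost:             # remember the best vector seen so far
            best, best_cost = list(x), cost
    return best, best_cost
```

A reactive variant would additionally track repeated solution vectors along the search path and lengthen or shorten `tabu_period` accordingly, as described in the summary.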

B. Motivation for Hybrid RTS-BP Algorithm 

The proposed hybrid RTS-BP approach is motivated by the 
following two observations we made in our BER simulations 
of the RTS algorithm: i) the RTS algorithm performed very 
close to optimum performance in large dimensions for 4- 
QAM; however, its higher-order QAM performance is far from 
optimal, and ii) at moderate to high SNRs, when an RTS 
output vector is in error, the least significant bits (LSB) of the 
data symbols are more likely to be in error than other bits. An 
analytical reasoning for the second observation can be given 
as follows. 

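Let the transmitted symbols take values from the $M$-QAM
alphabet $\mathbb{A}$, so that $\mathbf{x} \in \mathbb{A}^{Kn_t}$ is the transmitted vector. Consider
the real-valued system model corresponding to (4), given by
$\mathbf{r}' = \mathbf{H}'\mathbf{x}' + \mathbf{v}'$, where

$$ \mathbf{H}' = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix}, \quad \mathbf{x}' = \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix}, \quad \mathbf{r}' = \begin{bmatrix} \Re(\mathbf{r}) \\ \Im(\mathbf{r}) \end{bmatrix}, \quad \mathbf{v}' = \begin{bmatrix} \Re(\mathbf{v}) \\ \Im(\mathbf{v}) \end{bmatrix}. \qquad (27) $$

$\mathbf{x}'$ is a $2Kn_t \times 1$ vector; $[x'_1, \cdots, x'_{Kn_t}]$ can be viewed
to be from an underlying $M$-PAM signal set, and so is
$[x'_{Kn_t+1}, \cdots, x'_{2Kn_t}]$. Let $\{a_1, a_2, \cdots, a_M\}$ denote the
$M$-PAM alphabet that $x'_i$ takes its value from.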
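The stacking in (27) is mechanical, and a small sketch makes the block structure concrete. The helper names below (`stack_real`, `real_valued_model`) are illustrative only; the code uses plain nested lists so that it runs without third-party packages.

```python
def stack_real(v):
    """Map a complex vector to [Re(v); Im(v)] as in Eqn. (27)."""
    return [c.real for c in v] + [c.imag for c in v]

def real_valued_model(H, r):
    """Return (H', r') for the complex model r = H x + v, following Eqn. (27)."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]     # [Re(H), -Im(H)]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]   # [Im(H),  Re(H)]
    return top + bottom, stack_real(r)
```

Multiplying `H'` by `stack_real(x)` reproduces `stack_real(H x)`, which is why detection can be carried out entirely on the real-valued system with PAM symbols.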

Let $\hat{\mathbf{x}}'$ denote the detected output vector from the RTS
algorithm corresponding to the transmitted vector $\mathbf{x}'$. Consider
the expansion of the $M$-PAM symbols in terms of $\pm 1$'s,
where we can write the value of each entry of $\hat{\mathbf{x}}'$ as a linear
combination of $\pm 1$'s as

$$ \hat{x}'_i = \sum_{j=0}^{N-1} 2^j \, \hat{b}_i^{(j)}, \quad i = 1, \cdots, 2Kn_t, \qquad (28) $$



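where $N = \log_2 M$ and $\hat{b}_i^{(j)} \in \{\pm 1\}$. We note that the RTS
algorithm outputs a local minimum as the solution vector. So,
$\hat{\mathbf{x}}'$, being a local minimum, satisfies the following conditions:

$$ \|\mathbf{r}' - \mathbf{H}'\hat{\mathbf{x}}'\|^2 \leq \|\mathbf{r}' - \mathbf{H}'(\hat{\mathbf{x}}' + \lambda_i \mathbf{e}_i)\|^2, \quad \forall i = 1, \cdots, 2Kn_t, \qquad (29) $$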



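where $\lambda_i = (a_q - \hat{x}'_i)$, $q = 1, \cdots, M$, and $\mathbf{e}_i$ denotes the
$i$th column of the identity matrix. Defining $\mathbf{F}' = \mathbf{H}'^T \mathbf{H}'$ and
denoting the $i$th column of $\mathbf{H}'$ as $\mathbf{h}_i$, the conditions in (29)
reduce to

$$ 2 \lambda_i \, \mathbf{h}_i^T (\mathbf{r}' - \mathbf{H}'\hat{\mathbf{x}}') \leq \lambda_i^2 f_{ii}, \qquad (30) $$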



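where $f_{ij}$ denotes the $(i,j)$th element of $\mathbf{F}'$. Under moderate
to high SNR conditions, ignoring the noise, (30) can be further
reduced to

$$ 2 (\mathbf{x}' - \hat{\mathbf{x}}')^T \mathbf{f}_i \, \mathrm{sgn}(\lambda_i) \leq \lambda_i f_{ii} \, \mathrm{sgn}(\lambda_i), \qquad (31) $$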



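where $\mathbf{f}_i$ denotes the $i$th column of $\mathbf{F}'$. For Rayleigh fading,
$f_{ii}$ is chi-square distributed with $2KN_t$ degrees of freedom
with mean $KN_t$. Approximating the distribution of $f_{ij}$ to be
normal with mean zero and variance $\sigma_f^2$ for $i \neq j$ by the central
limit theorem, we can drop the $\mathrm{sgn}(\lambda_i)$ in (31). Using the fact
that the minimum value of $|\lambda_i|$ is 2, (31) can be simplified as

$$ \sum_{\hat{x}'_j \neq x'_j} \Delta_j f_{ij} \leq f_{ii}, \qquad (32) $$

where $\Delta_j = x'_j - \hat{x}'_j$. Also, if $\hat{x}'_i = x'_i$, by the normal
approximation in the above,

$$ \sum_{\hat{x}'_j \neq x'_j} \Delta_j f_{ij} \sim \mathcal{N}\Big( 0, \; \sigma_f^2 \sum_{\hat{x}'_j \neq x'_j} \Delta_j^2 \Big). \qquad (33) $$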

Now, the LHS in (32) being normal with variance proportional
to $\sum_j \Delta_j^2$ and the RHS being positive, it can be seen that
$\Delta_i$, $\forall i$, take smaller values with higher probability. Hence,
the symbols of $\hat{\mathbf{x}}'$ are nearest Euclidean neighbors of their
corresponding symbols of the transmitted vector with high
probability.‡ Now, because of the symbol-to-bit mapping in
(28), $\hat{x}'_i$ will differ from its nearest Euclidean neighbors
certainly in the LSB position, and may or may not differ in
other bit positions. Consequently, the LSBs of the symbols in
the RTS output $\hat{\mathbf{x}}'$ are least reliable.

The above observation then led us to consider improving the
reliability of the LSBs of the RTS output using the proposed
FG-GAI BP algorithm presented in Section IV, and to iterate
between RTS and FG-GAI BP as follows.

‡Because $\hat{x}'_i$ and $x'_i$ take values from the $M$-PAM alphabet, $\hat{x}'_i$ is said to
be the Euclidean nearest neighbor of $x'_i$ if $|\hat{x}'_i - x'_i| = 2$.

Fig. 15. Hybrid RTS-BP algorithm.

C. Proposed Hybrid RTS-BP Algorithm 

Figure 15 shows the block schematic of the proposed hybrid
RTS-BP algorithm. The following four steps constitute the
proposed algorithm.

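• Step 1: Obtain $\hat{\mathbf{x}}'$ using the RTS algorithm. Obtain the
output bits $\hat{b}_i^{(j)}$, $i = 1, \cdots, 2Kn_t$, $j = 0, \cdots, N-1$,
from $\hat{\mathbf{x}}'$ and (28).

• Step 2: Using the $\hat{b}_i^{(j)}$'s from Step 1, reconstruct the
interference from all bits other than the LSBs (i.e.,
interference from all bits other than the $\hat{b}_i^{(0)}$'s) as

$$ \tilde{\mathbf{I}} = \sum_{j=1}^{N-1} 2^j \, \mathbf{H}' \hat{\mathbf{b}}^{(j)}, \qquad (34) $$

where $\hat{\mathbf{b}}^{(j)} = [\hat{b}_1^{(j)}, \hat{b}_2^{(j)}, \cdots, \hat{b}_{2Kn_t}^{(j)}]^T$. Cancel the
reconstructed interference in (34) from $\mathbf{r}'$ as

$$ \tilde{\mathbf{r}}' = \mathbf{r}' - \tilde{\mathbf{I}}. \qquad (35) $$

• Step 3: Run the FG-GAI BP algorithm in Section IV on
the vector $\tilde{\mathbf{r}}'$ in Step 2, and obtain an estimate of the
LSBs. Denote this LSB output vector from FG-GAI BP
as $\tilde{\mathbf{b}}^{(0)}$. Now, using $\tilde{\mathbf{b}}^{(0)}$ from the BP output, and the
$\hat{\mathbf{b}}^{(j)}$, $j = 1, \cdots, N-1$, from the RTS output in Step 1,
reconstruct the symbol vector as

$$ \tilde{\mathbf{x}}' = \tilde{\mathbf{b}}^{(0)} + \sum_{j=1}^{N-1} 2^j \, \hat{\mathbf{b}}^{(j)}. \qquad (36) $$

• Step 4: Repeat Steps 1 to 3 using $\tilde{\mathbf{x}}'$ as the initial vector
to the RTS algorithm.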

The algorithm is stopped after a certain number of iterations
between RTS and BP. Our simulations showed that
two iterations between RTS and BP are adequate to achieve
good improvement; more than two iterations resulted in only
marginal improvement for the system parameters considered
in the simulations. Since the complexity of the BP part of RTS-BP
is less than that of the RTS part, the order of complexity of
RTS-BP is the same as that of RTS, $O(K^2 n_t^2)$.
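The bit-level bookkeeping in Steps 1 to 3 can be sketched as follows. This is an illustrative outline under the $\pm 1$ expansion of (28), with PAM symbols taken as odd integers; the helper names are hypothetical, and the BP stage itself is left out (any BPSK detector that returns $\pm 1$ LSB estimates could be plugged in between the cancellation and the merge).

```python
def pam_to_bits(symbol, n_bits):
    """Expand an odd-integer PAM symbol as sum_j 2^j b_j, b_j in {+1, -1}.

    Returns [b_0, ..., b_{N-1}] with the LSB first, following Eqn. (28).
    """
    bits = [0] * n_bits
    residual = symbol
    for j in range(n_bits - 1, -1, -1):     # peel off the most significant bit first
        bits[j] = 1 if residual > 0 else -1
        residual -= (2 ** j) * bits[j]
    return bits

def bits_to_pam(bits):
    """Inverse of pam_to_bits."""
    return sum((2 ** j) * b for j, b in enumerate(bits))

def cancel_higher_bits(Hp, rp, x_hat, n_bits):
    """Eqns. (34)-(35): remove the contribution of all non-LSB bits of x_hat."""
    higher = []
    for s in x_hat:
        bits = pam_to_bits(s, n_bits)
        higher.append(bits_to_pam(bits) - bits[0])   # sum over j >= 1 of 2^j b_j
    return [rp[i] - sum(Hp[i][k] * higher[k] for k in range(len(x_hat)))
            for i in range(len(rp))]

def merge_lsbs(x_hat, lsbs, n_bits):
    """Eqn. (36): keep the non-LSB bits of x_hat and take the LSBs from BP."""
    merged = []
    for s, b0 in zip(x_hat, lsbs):
        bits = pam_to_bits(s, n_bits)
        bits[0] = b0
        merged.append(bits_to_pam(bits))
    return merged
```

If the higher-order bits of the RTS output are correct, the cancelled vector reduces to a BPSK observation of the LSBs alone, which is exactly the setting the FG-GAI BP algorithm was designed for.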

D. Simulation Results 

Figure 16 shows the BER performance of the proposed
hybrid RTS-BP algorithm in comparison with those of the
RTS algorithm and the GTA-BP algorithm in [49] in $16 \times 16$
V-BLAST MIMO with 16-QAM on a frequency selective
channel with $L = 6$ equal energy multipath components and
$K = 64$ data vectors per frame. Because of the improvement
in the reliability of the LSBs due to the BP run on them, the RTS-BP
algorithm achieves better performance compared to the RTS
algorithm without BP. Also, both the RTS-BP and RTS algorithms
perform better than the GTA-BP in [49].

Fig. 16. BER performance comparison between the RTS-BP (proposed),
RTS, and GTA-BP (in [49]) in $16 \times 16$ V-BLAST MIMO with 16-QAM in
MIMO-ISI channel with $L = 6$, $K = 64$, uniform power-delay profile.
(Curves: GTA-BP in Ref. [49], RTS, RTS-BP (proposed), SISO AWGN;
horizontal axis: received SNR (dB).)

E. Complexity Reduction Using Selective BP 

In the proposed RTS-BP algorithm, BP was applied to the RTS
output unconditionally. However, the use of BP can
improve performance only when the RTS output is erroneous.
So, the additional complexity due to BP can be avoided if BP
is not carried out whenever the RTS output is error-free. To
decide whether to use BP or not, we can use the knowledge of
the simulated pdf of the ML cost of the RTS output vector, i.e.,
the pdf of $M_1 = \|\mathbf{r}' - \mathbf{H}'\hat{\mathbf{x}}'\|$. Figure 17 shows the simulated
pdf of $M_1$ for a $32 \times 32$ V-BLAST MIMO system with 64-QAM
at an SNR of 30 dB on flat fading ($L = K = 1$). From
Fig. 17, it is seen that a comparison of the value of $M_1$ with
a suitable threshold can give an indication of the reliability of
the RTS output. For example, the output is more likely to be
erroneous if $M_1 > 12$ in Fig. 17.

Based on the above observation, we modify the RTS-BP
algorithm as follows. The BP algorithm is used only if
$M_1 > \theta$; otherwise, the RTS output is taken as the final output.
The threshold $\theta$ has to be carefully chosen to achieve good
performance. It is seen that $\theta = 0$ corresponds to the case of
unconditional RTS-BP, and $\theta = \infty$ corresponds to the case of
RTS without BP. For $\theta = \infty$, there is no additional complexity
due to BP, but there is no performance gain compared to
RTS. For $\theta = 0$, performance gain is possible compared to
RTS, but the BP complexity is incurred for all realizations. So
there exists a performance-complexity trade-off as a function
of $\theta$. We illustrate this trade-off in Fig. 18 for a $32 \times 32$
V-BLAST system with 64-QAM in flat fading. For this purpose,
we define 'SNR gain' in dB for a given threshold $\theta$ as the
improvement in SNR achieved by RTS with selective BP using
threshold $\theta$ to achieve an uncoded BER of $10^{-3}$ compared to
RTS without BP. Likewise, we define 'complexity gain' for
a given $\theta$ as $10 \log_{10}(\beta)$, where $\beta$ is the ratio of the average
number of computations required to achieve $10^{-3}$ uncoded
BER in unconditional RTS-BP and that in RTS with selective
BP using threshold $\theta$. In Fig. 18, we plot these two gains on
the y-axis as a function of the threshold $\theta$. From this figure,
we can observe that for $\theta$ values less than 4, there is not much
complexity gain since such small threshold values invoke BP
more often (i.e., the system behaves more like unconditional
RTS-BP). Similarly, for $\theta$ values greater than 14, the system
behaves more like RTS without BP; i.e., the complexity gain is
maximum but there is no SNR gain. Interestingly, for $\theta$ values
in the range 4 to 14, maximum SNR gain is retained while
achieving significant complexity gain as well.

Fig. 17. Simulated pdfs of $M_1$, the ML cost of the RTS output vector, in a
$32 \times 32$ V-BLAST MIMO system with 64-QAM and SNR = 30 dB on flat
fading ($L = K = 1$).

Fig. 18. SNR gain versus complexity gain trade-off in selectively using BP
as a function of $\theta$ in a $32 \times 32$ V-BLAST MIMO system with 64-QAM at a
BER of 0.001 on flat fading ($L = K = 1$).
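The threshold rule and the complexity gain metric described in this subsection can be written down compactly. The sketch below is illustrative only: `refine` stands in for the BP stage and is a hypothetical callback, the cost is taken as the unsquared norm stated in the text, and the operation counts passed to `complexity_gain_db` would in practice come from simulation.

```python
import math

def rts_output_cost(Hp, rp, x_hat):
    """M_1 = ||r' - H' x_hat'|| for the RTS output (real-valued model)."""
    return math.sqrt(sum(
        (rp[i] - sum(Hp[i][k] * x_hat[k] for k in range(len(x_hat)))) ** 2
        for i in range(len(rp))))

def selective_bp(Hp, rp, x_hat, theta, refine):
    """Run the BP stage (`refine`) only when the RTS cost exceeds theta.

    Returns (output_vector, bp_was_used).
    """
    if rts_output_cost(Hp, rp, x_hat) > theta:
        return refine(x_hat), True
    return x_hat, False

def complexity_gain_db(ops_unconditional, ops_selective):
    """Complexity gain 10*log10(beta), beta = ops_unconditional / ops_selective."""
    return 10.0 * math.log10(ops_unconditional / ops_selective)
```

Setting `theta = math.inf` never invokes the BP stage, which corresponds to plain RTS, while a very small threshold approaches unconditional RTS-BP.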

VI. Conclusions 

In this paper, we demonstrated that belief propagation
on graphical models including Markov random fields and
factor graphs can be efficiently used to achieve near-optimal
detection in large-dimension MIMO-ISI channels at quadratic
and linear complexities in $Kn_t$. It was shown through
simulations that damping of messages/beliefs in the MRF BP
algorithm can significantly improve the BER performance
and convergence behavior. The Gaussian approximation of
interference we adopted in the factor graph approach is novel,
and it offers the attractive linear complexity in the number of
dimensions while achieving near-optimal performance in large
dimensions. In higher-order QAM, iterations between a tabu
search algorithm and the proposed FG-GAI BP algorithm were
shown to improve the bit error performance of the basic tabu
search algorithm. Although we have demonstrated the proposed
algorithms in uncoded systems, they can be extended to
coded systems as well, using either turbo equalization or joint
processing of the entire coded symbol frame based on
low-complexity graphical models. Finally, a theoretical analysis
of the convergence behavior and the bit error performance of
the proposed BP algorithms is challenging, and remains to be
studied.